Abstract. For every $\epsilon > 0$ and large enough $k$ we construct unsatisfiable $k$-CNF
formulas where every clause has $k$ distinct literals and every variable appears in
at most $(\frac{2}{e} + \epsilon)\frac{2^k}{k}$ clauses. Using a result of Berman, Karpinski and Scott we
also show that our result is asymptotically best possible: every $k$-CNF formula
where every variable appears in at most $\lfloor 2^{k+1}/(ek) \rfloor - 1$ clauses is satisfiable. The
determination of this extremal function is particularly important as it represents
the value where the $k$-SAT problem exhibits its complexity hardness jump: from
having every instance being a YES-instance it becomes NP-hard just by allowing
each variable to occur in one more clause.

The asymptotics of other related extremal functions are also determined. Let
$l(k)$ denote the maximum number such that every $k$-CNF formula with each
clause containing $k$ distinct literals and each clause having a common variable
with at most $l(k)$ other clauses is satisfiable. We establish that the bound
on $l(k)$ obtained from the local lemma is asymptotically optimal, i.e.,
$l(k) = (e^{-1} + o(1))2^k$.

The constructed formulas are all in the class MU(1) of minimal unsatisfiable
formulas having one more clause than variables and thus they resolve these
asymptotic questions within that class as well.
1. Introduction

The satisfiability of Boolean formulas is the archetypical NP-hard problem. Somewhat
unusually we define a $k$-CNF formula as the conjunction of clauses that are
the disjunction of exactly $k$ distinct literals. (Note that most texts allow shorter
clauses in a $k$-CNF formula, but fixing the exact length will be important for us later
on.) The problem of deciding whether a $k$-CNF formula is satisfiable is denoted by
$k$-SAT; it is solvable in polynomial time for $k = 2$, and is NP-complete for every
$k \ge 3$ as shown by Cook [3].

Papadimitriou and Yannakakis [18] have shown that $k$-SAT is even MAX-SNP-complete
for every $k \ge 2$.

The first level of difficulty in satisfying a CNF formula arises when two clauses
share variables. For a finer view into the transition to NP-hardness, a grading of the
class of $k$-CNF formulas can be introduced that limits how much clauses interact
locally. A $k$-CNF formula is called a $(k, s)$-CNF formula if every variable appears
in at most $s$ clauses. The problem of satisfiability of $(k, s)$-CNF formulas is denoted
by $(k, s)$-SAT.
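
The definition can be made concrete with a small membership test. The following is a minimal sketch, not part of the original text: the function name `is_ks_cnf` and the integer encoding of literals (`+v` for a variable, `-v` for its negation) are assumptions chosen for illustration.

```python
from collections import Counter


def is_ks_cnf(clauses, k, s):
    """Return True if `clauses` is a (k, s)-CNF formula.

    Assumed encoding: a clause is a sequence of nonzero integers,
    +v for variable v and -v for its negation.
    """
    occurrences = Counter()
    for clause in clauses:
        # Every clause must consist of exactly k distinct literals.
        if len(clause) != k or len(set(clause)) != k:
            return False
        # Count each variable once per clause it occurs in.
        occurrences.update({abs(literal) for literal in clause})
    return all(count <= s for count in occurrences.values())


# Hypothetical example: variable 1 occurs in all three clauses.
example = [(1, 2, 3), (-1, 2, 4), (1, -3, 4)]
```

Here `is_ks_cnf(example, 3, 3)` holds, while `is_ks_cnf(example, 3, 2)` fails because variable 1 occurs in three clauses.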

Tovey [23] proved that while every (3, 3)-CNF formula is satisfiable (due to Hall's 
theorem), the problem of deciding whether a (3,4)-CNF formula is satisfiable is 
already NP-hard. Dubois [5] showed that (4, 6)-SAT and (5,11)-SAT are also NP- 
complete. 

Kratochvíl, Savický, and Tuza [15] defined the value $f(k)$ to be the largest integer
$s$ such that every $(k, s)$-CNF is satisfiable. They also generalized Tovey's result by
showing that for every $k \ge 3$ $(k, f(k) + 1)$-SAT is already NP-complete. In other
words, for every $k \ge 3$ the $(k, s)$-SAT problem goes through a kind of "complexity
phase transition" at the value s = f(k). On the one hand the (k, f(k))-SAT problem 
is trivial by definition in the sense that every instance of the problem is a "YES"- 
instance. On the other hand the (k, f(k) + 1)-SAT problem is already NP-hard, so 
the problem becomes hard from being trivial just by allowing one more occurrence 
of each variable. For large values of k this might seem astonishing, as the value of 
the transition is exponential in k: one might think that the change of just one in 
the parameter should have hardly any effect. 

The complexity hardness jump is even greater: the problem of $(k, s)$-SAT is also
MAX-SNP-complete for every $s > f(k)$ as was shown by Berman, Karpinski,
and Scott [2] (generalizing a result of Feige [7] who showed that (3, 5)-SAT is hard
to approximate within a certain constant factor).

The determination of where this complexity hardness jump occurs is the topic of 
the current paper. 

For a lower bound the best tool available is the Lovász Local Lemma. It does
not deal directly with the number of occurrences of variables, but rather with pairs of
clauses that share at least one variable. We call such a pair an intersecting pair of
clauses. A straightforward consequence of the lemma states that if every clause
of a $k$-CNF formula intersects at most $2^k/e - 1$ other clauses, then the formula is
satisfiable. It is natural to ask how tight this bound is and for that Gebauer et al.
[9] define $l(k)$ to be the largest integer such that whenever all clauses
of a $k$-CNF formula intersect at most $l(k)$ other clauses the formula is satisfiable.
With this notation the Lovász Local Lemma implies that

(1.1)    $l(k) \ge \left\lfloor \frac{2^k}{e} \right\rfloor - 1.$



The order of magnitude of this bound is trivially optimal: $l(k) < 2^k - 1$ follows
from the unsatisfiable $k$-CNF consisting of all possible $k$-clauses on only $k$ variables.
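
The trivial upper bound can be checked by brute force for small $k$. The sketch below is an illustration only; the helper names `complete_formula`, `is_satisfiable` and `neighbours` and the signed-integer encoding of literals are assumptions, not notation from the paper.

```python
from itertools import product


def complete_formula(k):
    """All 2**k clauses with k literals on the variables 1..k."""
    return [tuple(v if positive else -v
                  for v, positive in zip(range(1, k + 1), signs))
            for signs in product([True, False], repeat=k)]


def is_satisfiable(clauses, num_vars):
    """Brute-force satisfiability test; only sensible for tiny formulas."""
    for values in product([False, True], repeat=num_vars):
        if all(any((lit > 0) == values[abs(lit) - 1] for lit in clause)
               for clause in clauses):
            return True
    return False


def neighbours(clauses, index):
    """Number of other clauses sharing a variable with clauses[index]."""
    variables = {abs(lit) for lit in clauses[index]}
    return sum(1 for i, other in enumerate(clauses)
               if i != index and variables & {abs(lit) for lit in other})
```

For `F = complete_formula(3)` the formula is unsatisfiable and every clause intersects all 7 others, which is the case $k = 3$ of the bound $l(k) < 2^k - 1$.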

In [9] a hardness jump is proved for the function $l$: deciding the satisfiability of
$k$-CNF formulas with maximum neighborhood size at most $\max\{l(k) + 2, k + 3\}$ is
NP-complete.

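As observed by Kratochvíl, Savický and Tuza [15] the bound (1.1) immediately
implies

(1.2)    $f(k) \ge \left\lfloor \frac{l(k)}{k} \right\rfloor + 1 \ge \left\lfloor \frac{2^k}{ek} \right\rfloor.$
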
From the other side Savický and Sgall [21] showed that $f(k) = O(k^{0.74} \cdot \frac{2^k}{k})$.
This was improved by Hoory and Szeider [12] who came within a logarithmic factor:
$f(k) = O(\log k \cdot \frac{2^k}{k})$. Recently Gebauer [10] showed that the order of magnitude
of the lower bound is correct and $f(k) = \Theta(2^k/k)$. This construction also improved
the trivial upper bound on $l(k)$ showing $l(k) < \frac{63}{64} 2^k$ for infinitely many $k$.

In our main theorem we determine the asymptotics of both $f(k)$ and $l(k)$. We
show that the lower bound (1.1) on $l(k)$ is asymptotically tight but its consequence
(1.2) is not tight: it can be strengthened by a factor of 2.



Theorem 1.1. 
The lower bound was achieved via the lopsided version of the Lovász Local Lemma
by Berman, Karpinski and Scott [2], although they did not calculate the asymptotic
behavior of their bound.

Since the (lopsided) Lovász Local Lemma was fully algorithmized by Moser
and Tardos [17] we now have that not only does every $(k, s)$-CNF formula for
$s = \lfloor 2^{k+1}/(ek) \rfloor - 1$ have a satisfying assignment but there is also an algorithm that
finds such an assignment in probabilistic polynomial time. Moreover, for just a
little bit larger value of the parameter $s$ one cannot find a satisfying assignment
efficiently simply because already the decision problem is NP-hard.
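
To illustrate the algorithmic statement, here is a minimal sketch of the basic resampling idea of Moser and Tardos in its plain, unbiased form. It is an assumption-laden illustration: the lopsided, biased sampling needed for the bound $s = \lfloor 2^{k+1}/(ek) \rfloor - 1$ is not reproduced, and the function name, the literal encoding and the `max_rounds` safety cap are choices made here, not taken from the paper.

```python
import random


def moser_tardos(clauses, num_vars, rng, max_rounds=100000):
    """Resample the variables of a violated clause until none is violated.

    Literals are nonzero integers (+v / -v). Returns a list of booleans for
    variables 1..num_vars, or None if the (hypothetical) round cap is hit.
    """
    # Index 0 is unused so that variable v lives at position v.
    assignment = [rng.random() < 0.5 for _ in range(num_vars + 1)]

    def violated(clause):
        # A clause is violated when every one of its literals is false.
        return all((lit > 0) != assignment[abs(lit)] for lit in clause)

    for _ in range(max_rounds):
        bad = next((c for c in clauses if violated(c)), None)
        if bad is None:
            return assignment[1:]
        # Resample, uniformly at random, exactly the variables of `bad`.
        for lit in bad:
            assignment[abs(lit)] = rng.random() < 0.5
    return None


# Hypothetical small satisfiable instance.
clauses = [(1, 2, 3), (-1, 2, 4), (1, -3, -4), (-2, 3, 4)]
result = moser_tardos(clauses, 4, random.Random(0))
```

On a satisfiable instance such as the one above the loop ends with an assignment satisfying every clause; the local lemma condition is what guarantees a polynomial expected number of resampling steps in general.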

1.1. The class MU(1). The function $f(k)$ is not known to be computable. In order
to still be able to upper bound its value, one tries to restrict to a smaller/simpler
class of formulas. When looking for unsatisfiable $(k, s)$-CNF formulas it is naturally
enough to consider minimal unsatisfiable formulas, i.e., unsatisfiable CNF formulas
that become satisfiable if we delete any one of their clauses. The set of minimal
unsatisfiable CNF formulas is denoted by MU. As observed by Tarsi (cf. [1]), all
formulas in MU have more clauses than variables, but some have only one more. The
class of these MU formulas, having one more clause than variables, is denoted by
MU(1). This class has been widely studied (see, e.g., [1], [4], [13], [16], [23]). Hoory
and Szeider [11] considered the function $f_1(k)$, denoting the largest integer such
that no $(k, f_1(k))$-CNF formula is in MU(1). They showed that $f_1(k)$ is computable
and is an upper bound for $f(k)$. Their computer search determined the value of
$f_1(k)$ for small $k$: $f_1(5) = 7$, $f_1(6) = 11$, $f_1(7) = 17$, $f_1(8) = 29$, $f_1(9) = 51$. Via
the trivial inequality $f(k) \le f_1(k)$, these are the best known upper bounds on $f(k)$
in this range. In contrast, even the value of $f(5)$ is not known.

It is also not known whether $f(k) = f_1(k)$ for every $k$. Our upper bound
construction from Theorem 1.1 does reside in the class MU(1) and hence shows that
$f(k)$ and $f_1(k)$ are equal asymptotically: $f(k) = (1 + o(1))f_1(k)$.

Scheder [22] showed that for almost disjoint $k$-CNF formulas (i.e. CNF formulas
where any two clauses have at most one variable in common) the two functions
are not the same. That is, if $\tilde f(k)$ denotes the maximum $s$ such that every almost
disjoint $(k, s)$-CNF formula is satisfiable, then for $k$ large enough every unsatisfiable
almost disjoint $(k, \tilde f(k) + 1)$-CNF formula is outside of MU(1).

2. Constructing binary trees and MU(1) formulas 

The structure of MU(1) formulas is well understood and it is closely related to 
binary trees. In particular, given any finite rooted binary tree T with the property 
that each non-leaf vertex has exactly two children we associate with it certain CNF 
formulas. We start with assigning distinct literals to the vertices, assigning the 
negated and non-negated form of the same variable to the two children of any non- 
leaf vertex. We do not assign any literal to the root. For each leaf of T consider a 
clause that is the disjunction of some literals along the path from the root to that 
leaf and consider the CNF formula F that is the conjunction of one clause for each 
leaf. Clearly, F is unsatisfiable and it has one more clause than the number of
variables associated with T. As proved in [1] F is an MU(1) formula if and only if
all literals associated to vertices of T do appear in F, furthermore every formula in 
MU(1) can be obtained from a suitable binary tree this way. 
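
The correspondence can be tried out on a small tree. The sketch below is illustrative only and makes several assumptions: a tree is encoded as `None` for a leaf or a pair `(left, right)` for an inner vertex, variables are numbered in preorder, and each clause takes all literals on its root-leaf path (one admissible choice of "some literals").

```python
from itertools import product


def tree_to_formula(tree, path=(), counter=None):
    """One clause per leaf, made of all literals on the root-leaf path.

    Assumed encoding: `None` is a leaf, `(left, right)` an inner vertex.
    The left child of an inner vertex gets literal +v, the right child -v.
    """
    if counter is None:
        counter = [0]
    if tree is None:
        return [path]
    counter[0] += 1
    variable = counter[0]
    left, right = tree
    return (tree_to_formula(left, path + (variable,), counter)
            + tree_to_formula(right, path + (-variable,), counter))


def is_satisfiable(clauses, num_vars):
    """Brute-force satisfiability test; only sensible for tiny formulas."""
    for values in product([False, True], repeat=num_vars):
        if all(any((lit > 0) == values[abs(lit) - 1] for lit in clause)
               for clause in clauses):
            return True
    return False


# Hypothetical tree with 4 inner vertices and 5 leaves.
tree = ((None, None), (None, (None, None)))
formula = tree_to_formula(tree)
```

For this tree the formula has 5 clauses on 4 variables, is unsatisfiable, and becomes satisfiable as soon as any single clause is deleted, as expected of a formula in MU(1).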

We bound $f(k)$ through bounding $f_1(k)$, that is by finding $(k,d)$-CNF formulas
in MU(1). By the above correspondence between MU(1) formulas and trees this
involves constructing a rooted binary tree and selecting a $k$-subset of the vertices
on every root-leaf path. We want to concentrate on the tree building so we further
restrict the class of $k$-CNF formulas in MU(1) that we are considering to formulas
obtained from binary trees by selecting on every root-leaf path the $k$ vertices farthest
from the root. Accordingly we define:

A $(k, d)$-tree is a finite rooted binary tree with all non-leaf vertices having exactly
two children that satisfies that (i) there is no leaf at distance less than $k$ from the
root and (ii) for every vertex $v$ there are at most $d$ leaves among the descendants of
$v$ at distance at most $k$ from $v$.
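
Both conditions are easy to test mechanically on small trees. The following sketch is an illustration under stated assumptions: a tree is encoded as `None` for a leaf or a pair `(left, right)` for an inner vertex, a leaf counts as its own descendant at distance 0, and all function names are hypothetical.

```python
def min_leaf_depth(tree):
    """Distance from the root to the nearest leaf."""
    if tree is None:
        return 0
    return 1 + min(min_leaf_depth(child) for child in tree)


def leaves_within(tree, dist):
    """Number of leaves at distance at most `dist` below (or at) the root."""
    if tree is None:
        return 1
    if dist == 0:
        return 0
    return sum(leaves_within(child, dist - 1) for child in tree)


def is_kd_tree(tree, k, d):
    """Check conditions (i) and (ii) of the (k, d)-tree definition."""
    def condition_ii(node):
        if leaves_within(node, k) > d:
            return False
        return node is None or all(condition_ii(child) for child in node)

    return min_leaf_depth(tree) >= k and condition_ii(tree)


def full_tree(depth):
    """Complete binary tree of the given depth."""
    if depth == 0:
        return None
    return (full_tree(depth - 1), full_tree(depth - 1))
```

For instance, the complete binary tree of depth 3 is a $(3, 8)$-tree but not a $(3, 7)$-tree, since its root sees all 8 leaves within distance 3.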

Let $f_2(k)$ be the largest integer $d$ such that no $(k, d)$-tree exists. We clearly have

Lemma 2.1. $f_1(k) \le f_2(k)$ for every $k$.

Proof: Let $d = f_2(k) + 1$, $T$ a $(k, d)$-tree and consider the associated unsatisfiable
$k$-CNF formula $F$ as explained above. Note that condition (i) in the definition ensures
that each root-leaf path is long enough and so $F$ exists, while condition (ii) makes
$F$ a $(k, d)$-CNF. The existence of $F$ proves $f(k) < d$, so we have $f(k) \le f_2(k)$. To
prove $f_1(k) \le f_2(k)$ we need a $(k,d)$-CNF in MU(1). Unfortunately, $F$ does not
necessarily lie in MU(1). This is because if some literal associated to a vertex $v$ of $T$
does not appear in $F$, then $F$ is not even in MU. This happens if and only if there
is no leaf at distance less than $k$ among the descendants of the non-leaf vertex $v$.
In this case the subtree of $T$ rooted at $v$ is also a $(k, d)$-tree, so if we pick $T$ to be
a minimal $(k, d)$-tree, then the corresponding $(k, d)$-CNF formula is in MU(1). □

Our goal is to bound $f_2(k)$ (and hence $f_1(k)$ and $f(k)$) by constructing $(k, d)$-trees
for suitable values of $k$ and $d$.

A $(k,d)$-vector is a $(k + 1)$-tuple $x = (x_0, \ldots, x_k)$ of non-negative integers with
$|x| \le d$. Here $|x|$ is defined as $|x| = \sum_{j=0}^{k} x_j$, while the weight of $x$ is $\sum_{j=0}^{k} x_j/2^j$. We
call a finite rooted binary tree a $(k, d, x)$-tree if all non-leaf vertices have exactly two
children and (i') there are at most $x_j$ leaves at distance exactly $j$ from the root for
$j = 0, \ldots, k$ and (ii) for every vertex $v$ there are at most $d$ leaves among the descendants
of $v$ at distance at most $k$ from $v$. We call a $(k, d)$-vector $x$ $(k, d)$-constructible, or
constructible if $k$ and $d$ are clear from the context, if a $(k, d, x)$-tree exists.

We will need the following lemma, which is a dual version of Kraft's inequality [14].

Lemma 2.2. For a (k,d)-vector x a (k,d,x)-tree of depth at most k exists if and 
only if the weight of x is at least 1. In particular, the (k,d)-vectors of weight at 
least 1 are constructible. 
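
The "if" direction of Lemma 2.2 can be illustrated by the usual greedy argument behind Kraft's inequality: going level by level, turn as many open vertices into leaves as $x$ allows and give two children to the rest. The sketch below is an illustration only; the function names and the choice of returning the number of leaves per level (rather than an explicit tree) are assumptions made here.

```python
from fractions import Fraction


def weight(x):
    """Weight of a vector: the sum of x_j / 2**j, computed exactly."""
    return sum(Fraction(xj, 2 ** j) for j, xj in enumerate(x))


def leaf_profile(x):
    """Leaves per level of a binary tree dominated coordinate-wise by x.

    Greedy construction: at level j, close min(x_j, open) vertices as leaves
    and give two children to every remaining open vertex. Returns None when
    open vertices remain after level k, which happens when the weight of x
    is below 1.
    """
    open_vertices = 1  # the root
    profile = []
    for xj in x:
        leaves = min(xj, open_vertices)
        profile.append(leaves)
        open_vertices = 2 * (open_vertices - leaves)
        if open_vertices == 0:
            break
    if open_vertices:
        return None
    return profile + [0] * (len(x) - len(profile))
```

For example, `leaf_profile((0, 1, 1, 2))` returns `[0, 1, 1, 2]` (weight exactly 1), while `leaf_profile((0, 0, 3))` returns `None` because the weight is only 3/4. Any returned profile has weight exactly 1, matching the remark that a vector of larger weight dominates one of weight exactly 1.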

Note that to obtain this form of Kraft's inequality from the standard version (a 
detailed explanation can be found e.g., in [20], Theorem 2.1.2.) one has to use the 
simple observation that if the weight of x is strictly above 1, then there is another 
$(k, d)$-vector $x'$ coordinate-wise dominated by $x$, whose weight is exactly 1. A very
similar form of the inequality was also used in [10].

Let us fix the positive integers $k$, $d$ and $l$. We will not show the dependence on
these parameters in the next definitions to simplify the notation, although $d'$, $E$,
$C_r$ and $C^*_r$ depend on them. We let $d' = d(1 - 1/(2^l - 1))$. For a $(k, d)$-vector
$x = (x_0, \ldots, x_k)$ we define $E(x) = (\lfloor x_1/2 \rfloor, \lfloor x_2/2 \rfloor, \ldots, \lfloor x_k/2 \rfloor, \lfloor d'/2 \rfloor)$. We denote
by $E^m(x)$ the vector obtained from $x$ by $m$ applications of the operation $E$. For
$l \le r \le k$ we define $C_r(x)$ to be the $(k + 1)$-tuple starting with $r + 1 - l$ zero
entries followed by $\lfloor x_j/(2^l - 1) \rfloor$ for $j = r + 1, \ldots, k$, followed by $\lfloor d'/2^{l-j} \rfloor$ for
$j = 0, 1, \ldots, l - 1$ and let $C^*_r(x)$ be the $(k + 1)$-tuple starting with $x_j$ for $j = l, l + 1, \ldots, r$
followed by $k - r + l$ zeros.

Note that for the following lemma to hold we could use $d$ instead of $d'$ in the
definition of $E$ and also in most places in the definition of $C_r$. The one place where
we cannot do this is the entry $\lfloor d'/2^l \rfloor$ of $C_r(x)$ right after $\lfloor x_k/(2^l - 1) \rfloor$. If we used
a higher value there, then one of the children of the root of the tree constructed in
the proof below would have more than $d$ leaves among its descendants at distance
at most $k$. We use $d'$ everywhere to be consistent and provide for the monotonicity
as used in the proof of Theorem 2.1.

Lemma 2.3. Let $k$, $d$ and $l$ be positive integers and $x$ a $(k,d)$-vector.
(a) $E(x)$ is a $(k,d)$-vector. If $E(x)$ is constructible, then so is $x$.
(b) For $l \le r \le k$ both $C_r(x)$ and $C^*_r(x)$ are $(k,d)$-vectors. If both of these vectors
are constructible and $|C^*_r(x)| \le d/2^l$, then $x$ is also constructible.

Proof: (a) We have $|E(x)| \le |x|/2 + d'/2 \le d$, so $E(x)$ is a $(k, d)$-vector. If there
exists a $(k, d, E(x))$-tree, take two disjoint copies of such a tree and connect them
with a new root vertex, whose children are the roots of these trees. The new binary
tree so obtained is a $(k, d, x)$-tree.
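
The bookkeeping behind part (a) can be followed on leaf counts alone. The sketch below is an illustration with hypothetical names: `apply_E` implements the operation $E$, and `join_two_copies` computes the number of leaves at each distance $0, \ldots, k$ from the new root when two copies of a tree with the given leaf counts are joined.

```python
from fractions import Fraction
from math import floor


def apply_E(x, d_prime):
    """E(x): drop x_0, halve x_1..x_k rounding down, append floor(d'/2)."""
    return tuple(xj // 2 for xj in x[1:]) + (floor(Fraction(d_prime) / 2),)


def join_two_copies(profile):
    """Leaves at distance 0..k from a new root placed above two copies.

    A leaf at distance j - 1 in a copy sits at distance j from the new root,
    so the counts shift by one place and double; the last entry of `profile`
    would land at distance k + 1 and is therefore dropped.
    """
    return (0,) + tuple(2 * count for count in profile[:-1])


# Hypothetical example with k = 3 and d' = 10.
x = (0, 3, 5, 8)
shifted = apply_E(x, 10)
joined = join_two_copies(shifted)
```

Here `shifted` is `(1, 2, 4, 5)` and `joined` is `(0, 2, 4, 8)`, which is dominated coordinate-wise by `x`; this is condition (i') for the joined tree.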

(b) The sum of the first $k + 1 - l$ entries of $C_r(x)$ is at most $|x|/(2^l - 1) \le d/(2^l - 1)$,
the remaining fixed terms sum to less than $d' = d(1 - 1/(2^l - 1))$, so $|C_r(x)| < d$.
We trivially have $|C^*_r(x)| \le |x| \le d$, so both $C_r(x)$ and $C^*_r(x)$ are $(k, d)$-vectors.

Let $T$ be a $(k, d, C_r(x))$-tree and $T^*$ a $(k, d, C^*_r(x))$-tree. Consider a full binary
tree of depth $l$ and attach $T^*$ to one of the $2^l$ leaves of this tree and attach a separate
copy of $T$ to all remaining $2^l - 1$ leaves. This way we obtain a finite binary tree
$T'$ with all non-leaf vertices having exactly two children. To check condition (i')
of the definition notice that no leaf of $T'$ is at distance less than $l$ from the root,
leaves at distance $l \le j \le r$ are all in $T^*$ and leaves at distance $r < j \le k$ are all in
the $2^l - 1$ copies of $T$. Condition (ii) we have to check only for vertices at distance
$1 \le j < l$ from the root. There are two types of vertices at distance $j$. One of them
has $2^{l-j}$ copies of $T$ below it, the other has one less and also $T^*$. It is a
straightforward calculation to bound the number of leaves below and at distance at most $k$
in either case. Instead of giving all the details of the computation we mention that
the vertex of the first type at distance $j = 1$ from the root makes us use the specific
value of $d' < d$ we use and the vertices of the second type make it necessary to
assume a bound on $|C^*_r(x)|$. □

Armed with the last two lemmas we are ready to prove the main result of this 
paper. 



Theorem 2.1. $f_2(k) \le \frac{2^{k+1}}{ek} + O\left(\frac{2^{k+1}}{k^{3/2}}\right)$.

We show that $(k, d)$-trees exist for large enough $k$ and with
$d = \lfloor 2^{k+1}/(ek) + 100 \cdot 2^{k+1}/k^{3/2} \rfloor$.

We set $l = \lfloor (\log k)/2 \rfloor$ and $s = 2l$. Here log denotes the binary logarithm, so
$2^l \approx \sqrt{k}$. We define the $(k, d)$-vectors $x^{(t)} = (x_0^{(t)}, \ldots, x_k^{(t)})$ recursively. We start
with $x^{(0)} = E^{k-s}(z)$, where $z$ denotes the all zero $(k, d)$-vector. For $t \ge 0$ we
define $x^{(t+1)} = E^{r_t - s - l}(C_{r_t}(x^{(t)}))$, where $r_t$ is the smallest index in the range
$s + l \le r_t \le k$ with $\sum_{j=0}^{r_t} x_j^{(t)}/2^j \ge 2^{-l}$. At this point we may consider the sequence
of the $(k, d)$-vectors $x^{(t)}$ to end whenever the weight of one of them falls below $2^{-l}$ and
thus the definition of $r_t$ does not make sense. But we will establish below that this
never happens and the sequence is infinite.

Notice first, that we have $x_j^{(t)} = 0$ for all $t$ and $0 \le j \le s$, while the entries $x_j^{(t)}$
for $s < j \le k$ are all obtained from $d'$ by repeated application of dividing by an
integer (namely by $2^l - 1$ or by a power of 2) and taking lower integer part. Using
the simple observation that $\lfloor \lfloor a \rfloor / j \rfloor = \lfloor a/j \rfloor$ if $a$ is real and $j$ is a positive integer we
can ignore all roundings but the last. This way we can write each of these entries
in the form

$\left\lfloor \frac{d'}{2^i (2^l - 1)^c} \right\rfloor = \left\lfloor \frac{d'}{2^{i + lc}} \left(1 + \frac{1}{2^l - 1}\right)^c \right\rfloor$

for some non-negative integers $i$ and $c$. Using the values $\alpha = 1 + 1/(2^l - 1)$ and
$q_t = r_t - s$ (this is the amount of "left shift" between $x^{(t)}$ and $x^{(t+1)}$) we can give
the exponents explicitly:

(2.1)    $x_j^{(t)} = \left\lfloor \frac{d'}{2^{k+1-j}} \alpha^{c(t,j)} \right\rfloor$

for all $s < j \le k$ and all $t$, where $c(t,j)$ is the largest integer $0 \le c \le t$ satisfying
$\sum_{i=t-c}^{t-1} q_i \le k - j$. We define $c(t, j) = 0$ if $q_{t-1} > k - j$.

The formal inductive proof of this formula is a straightforward calculation. What
really happens here (ignoring the rounding) is that each entry enters at the right
end of the vector as $d'/2$ and is divided by 2 every time it moves one place to the
left (application of $E$) but when it moves $l$ places to the left through an application
of $C_{r_t}$ it is divided by $2^l - 1$ instead of $2^l$ so it gains a factor of $\alpha$. The exponent
$c(t, j)$ counts how many such factors are accumulated. If the "ancestor" of the entry
$x_j^{(t)}$ was first introduced in $x^{(t')}$, then $c(t,j) = t - t'$.

We claim next that $c(t,j)$ and $x_j^{(t)}$ increase monotonically in $t$ for each fixed
$s < j \le k$, while $q_t$ decreases monotonically with $t$. We prove these statements by
induction on $t$. We have $c(0, j) = 0$ for all $j$, so $c(1, j) \ge c(0, j)$. If $c(t+1, j) \ge c(t, j)$
for all $j$, then all entries of $x^{(t+1)}$ dominate the corresponding entries of $x^{(t)}$ by
Equation (2.1). If $x_j^{(t+1)} \ge x_j^{(t)}$ for all $j$, then we have $r_{t+1} \le r_t$ by the definition of
these numbers, so we also have $q_{t+1} \le q_t$. Finally, if $q_i$ is decreasing for $i \le t + 1$,
then by the definition of $c(i,j)$ we have $c(i + 1, j) \ge c(i, j)$ for $i \le t + 1$.

The monotonicity just established also implies that the weight of $x^{(t)}$ is also
increasing, so if the weight of $x^{(0)}$ is at least $2^{-l}$, then so is the weight of all the
other $x^{(t)}$, and thus the sequence is infinite. The weight of $x^{(0)}$ is

$\sum_{j=s+1}^{k} \frac{1}{2^j} \left\lfloor \frac{d'}{2^{k+1-j}} \right\rfloor > \sum_{j=s+1}^{k} \frac{1}{2^j} \left( \frac{d'}{2^{k+1-j}} - 1 \right) > (k - s) \frac{d'}{2^{k+1}} - 2^{-s} > \frac{k - s}{ek} \left( 1 - \frac{1}{2^l - 1} \right) - 2^{-s},$

where the last inequality follows from $d > 2^{k+1}/(ek)$ and the last term tends to $e^{-1}$
as $k$ tends to infinity, so it is larger than $2^{-l}$ for large enough $k$.

We have just established that the sequence $x^{(t)}$ of $(k, d)$-vectors is infinite and
coordinate-wise increasing. Since $|x^{(t)}| \le d$ and it must strictly increase before the
sequence stabilizes, the sequence must stabilize in at most $d$ steps. So $x^{(t)} = x =
(x_0, \ldots, x_k)$ for $t \ge d$. This implies that $q_t$ also stabilizes with $q_t = q$ for $t \ge d$.
Equation (2.1) as applied to $t > d + k$ simplifies to

(2.2)    $x_j = \left\lfloor \frac{d'}{2^{k+1-j}} \alpha^{\lfloor (k-j)/q \rfloor} \right\rfloor$

for all $s < j \le k$.

Recall that $q = r_t - s$, and $r_t$ is defined as the smallest index in the range
$s + l \le r \le k$ with $\sum_{j=0}^{r} x_j/2^j \ge 2^{-l}$. Thus we have $q \ge l$. We claim that equality
holds. Assume for contradiction that $q > l$. Then by the minimality of $r_t$ we must
have

$2^{-l} > \sum_{j=0}^{s+q-1} \frac{x_j}{2^j} = \sum_{j=s+1}^{s+q-1} \frac{1}{2^j} \left\lfloor \frac{d'}{2^{k+1-j}} \alpha^{\lfloor (k-j)/q \rfloor} \right\rfloor > \sum_{j=s+1}^{s+q-1} \frac{1}{2^j} \left( \frac{d'}{2^{k+1-j}} \alpha^{(k-j)/q - 1} - 1 \right) > (q - 1) \frac{d'}{2^{k+1}} \alpha^{k/q - 4} - 2^{-2l}.$

In the last inequality we used $s = 2l$ and that $\frac{j}{q} \le \frac{s + q}{q} = 1 + \frac{s}{q} < 1 + \frac{2l}{l} = 3$. This
inequality simplifies to

(2.3)    $2^{-l}(1 + 2^{-l}) \frac{2^{k+1}}{d'} \alpha^4 > (q - 1) \alpha^{k/q}.$

Simple calculus gives that the right hand side takes its minimum for $q \ge 2$
between $c - 1$ and $c - 2$ for $c = k \ln \alpha$ and this minimum is more than $(c - 3)e$.
Using $\alpha = 1 + 1/(2^l - 1) > e^{2^{-l}}$ we have $c > k/2^l$. So Inequality (2.3) yields

$2^{-l}(1 + 2^{-l}) \frac{2^{k+1}}{d'} \alpha^4 > \left( \frac{k}{2^l} - 3 \right) e.$

Substituting our choice for $d$, $d'$, $\alpha$ and $l$ (as functions of $k$) shows that this last
formula does not hold for large $k$. The contradiction proves $q = l$ as claimed.

We finish the proof of the theorem by establishing that the $(k, d)$-vectors $x^{(t)}$ are
constructible. We prove this statement by downward induction on $t$. We start with
$t = d$, where $x^{(d)} = x$ and by Equation (2.2) its weight is readily computable. Here
we omit the details of this straightforward calculation but state that the weight is
above 1. So by Lemma 2.2 $x$ is constructible.

Now assume that $x^{(t+1)}$ is constructible. Recall that $x^{(t+1)} = E^{r_t - s - l}(C_{r_t}(x^{(t)}))$,
so by (the repeated use of) Lemma 2.3 part (a) $C_{r_t}(x^{(t)})$ is constructible. By part
(b) of the same lemma $x^{(t)}$ is also constructible (and thus the inductive step is
complete) if we can (i) show that $C^*_{r_t}(x^{(t)})$ is constructible and (ii) establish that
$|C^*_{r_t}(x^{(t)})| \le d/2^l$. For (i) we use the definition of $r_t$: $\sum_{j=0}^{r_t} x_j^{(t)}/2^j \ge 2^{-l}$. But
the weight of $C^*_{r_t}(x^{(t)})$ is $\sum_{j=l}^{r_t} x_j^{(t)}/2^{j-l}$, so the contribution of each term with
$j \ge l$ is multiplied by $2^l$, while the missing terms $j < l$ contributed zero anyway
as $l \le s$. This shows that the weight of $C^*_{r_t}(x^{(t)})$ is at least 1 and therefore Lemma 2.2
proves (i). For (ii) we use monotonicity to see $r_t \le r_0$ and Equation (2.1) to
see that $r_0 \le 6k/2^l$ for large enough $k$. Then we use monotonicity again to see
$|C^*_{r_t}(x^{(t)})| = \sum_{j=s+1}^{r_t} x_j^{(t)} \le \sum_{j=s+1}^{r_0} x_j$. Finally we use Equation (2.2) to see that
this number is well below $d/2^l$ for large enough $k$. This finishes the proof of (ii) and
hence the inductive proof that $x^{(t)}$ is constructible for every $t$.

As $x^{(0)} = E^{k-s}(z)$ is constructible Lemma 2.3 (a) implies that $z$ is also
constructible, so there exists a $(k, d, z)$-tree $T$. As $z$ is the all zero vector, $T$ must also
be a $(k, d)$-tree. □

3. Proof of Theorems 1.1 and 1.2

Proof of Theorem 1.2: The lower bound part of the theorem is an immediate and
standard application of the Lovász Local Lemma. For the upper bound we
need unsatisfiable $k$-CNF formulas with each clause intersecting just a few other
clauses. Note that the formula constructed in the previous proof is not necessarily
good enough. We need not only that each of the variables appears at most $d$ times,
but the variables must also be balanced: no literal should appear significantly more
than $d/2$ times. Fortunately the existence of such formulas follows easily from
Theorem 2.1 and Lemma 3.1 below. We remark here that we do not have to rely on
the fact that the proof of Theorem 2.1 established not only the existence of $(k,d)$-trees
for $d = (1 + O(k^{-1/2}))2^{k+1}/(ek)$ but also the existence of $(k, d, z)$-trees, where $z$
is the all zero vector. The latter readily follows from the former: simply take two
copies of a $(k, d)$-tree and join them by a new root vertex whose children are the
roots of the two copies. The tree so obtained is a $(k, d, z)$-tree. □

Lemma 3.1. Let $T$ be a $(k, d, z)$-tree, where $z$ is the all zero $(k, d)$-vector. Then $T$ is
also a $(k + 1, 2d)$-tree and in the unsatisfiable $(k + 1)$-CNF formula $F$ corresponding
to $T$ every literal appears at most $d$ times. Furthermore every clause of $F$ intersects
at most $d(k + 1)$ other clauses.

Proof: The first statement is immediate from the correspondence between trees and 
formulas. For the last statement note that a formula so obtained from a tree has the 
property that no two of its clauses are positively correlated: if they intersect they 
have a variable that appears positively in one and in negated form in the other. 
Therefore each other clause that a given clause C intersects has a literal that is 
opposite to a literal appearing in C. □ 
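
The key property used in this proof, that two intersecting clauses of a tree formula always contain a pair of opposite literals, can be checked on a small example. The sketch below is illustrative only: the tree encoding (`None` for a leaf, a pair for an inner vertex), the preorder numbering of variables and all function names are assumptions, and each clause takes the last `k` literals of its root-leaf path.

```python
from itertools import combinations


def suffix_clauses(tree, k, path=(), counter=None):
    """One clause per leaf: the last k literals on the root-leaf path.

    Assumes every leaf is at distance at least k from the root.
    """
    if counter is None:
        counter = [0]
    if tree is None:
        return [path[-k:]]
    counter[0] += 1
    variable = counter[0]
    left, right = tree
    return (suffix_clauses(left, k, path + (variable,), counter)
            + suffix_clauses(right, k, path + (-variable,), counter))


def intersect(c1, c2):
    """True if the two clauses share a variable."""
    return bool({abs(lit) for lit in c1} & {abs(lit) for lit in c2})


def conflict(c1, c2):
    """True if some literal of c1 appears negated in c2."""
    return any(-lit in c2 for lit in c1)


# Hypothetical tree whose leaves are all at distance at least 2 from the root.
tree = ((None, None), (None, (None, None)))
formula = suffix_clauses(tree, 2)
```

In this example every intersecting pair of clauses is also a conflicting pair, so counting the clauses that intersect a given clause reduces to counting occurrences of the literals opposite to its own.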

Berman, Karpinski and Scott [2] noticed that the lower bound (1.2) on $f(k)$
that is implied by the Lovász Local Lemma can be improved using the lopsided
version of the lemma. They do not compute the asymptotic implications of their
bound because they believe that "it should be possible to obtain better bounds by
employing techniques from hypergraph coloring (see, for instance, Radhakrishnan
and Srinivasan [19])."

As it turns out their lower bound improves (1.2) by a factor of 2 and is asymptotically
tight. It is proved with the lopsided version of the local lemma that allows for
"ignoring" that a pair of clauses in a CNF formula intersects if no variable appears
negated in one and in non-negated form in the other. This reduces the number of
intersections to consider and if each variable appears at most $d/2$ times negated
and at most $d/2$ times non-negated in a $(k, d)$-CNF, then each clause intersects at


THE LOCAL LEMMA IS TIGHT FOR SAT 




most $kd/2$ other clauses instead of the original $k(d - 1)$ bound. If, however, some
of the variables are unbalanced and say appear almost $d$ times negated but only a
handful of times non-negated, then there might be a few clauses that still intersect
almost $kd$ other clauses. This situation is handled by the non-uniform version of
the local lemma and by making these variables biased. Surprisingly, however, one
has to make a variable biased in a very counter-intuitive direction: the more the
variable appears negated the less likely we set it false. (This strange direction of
the bias is not apparent from [2], probably due to a typo.)



Theorem 3.1 ([2]). We have $f(k) \ge d$ if there exists $x \in (0, 1)$ with

$x^{1/k} \left( (1 - x)^{\lceil d/2 \rceil} + (1 - x)^{\lfloor d/2 \rfloor} \right) \ge 1.$



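Proof of Theorem 1.1: Let $k$ be an arbitrary positive integer, $d = \left\lfloor \frac{2^{k+1}}{ek} \right\rfloor - 1$ and
set $x = e/2^k$. We have

$x^{1/k} \left( (1 - x)^{\lceil d/2 \rceil} + (1 - x)^{\lfloor d/2 \rfloor} \right) \ge 2x^{1/k}(1 - x)^{d/2} \ge e^{1/k} \left( 1 - \frac{e}{2^k} \right)^{\frac{2^k}{ek} - \frac{1}{2}} > 1.$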

Thus Theorem 3.1 establishes the lower bound part of the theorem. The upper
bound follows from Theorem 2.1, Lemma 2.1 and the fact that $f(k) \le f_1(k)$. □



4. On the size of unsatisfiable formulas 

By the size of a tree we mean the number of its leaves and by the size of a
CNF formula we mean the number of its clauses. With this notation the size of a
$(k, d)$-tree and the size of the corresponding $(k, d)$-CNF in MU(1) are the same.

When proving Theorem 2.1 we constructed $(k, d)$-trees for $d \approx \frac{2^{k+1}}{ek}$. Their size
and therefore the size of the corresponding $(k, d)$-CNF in MU(1) is at most $2^h$,
where $h$ is the depth of the tree: the largest root-leaf distance. In fact, the sizes
of the trees we constructed are very close to this upper bound. Therefore it makes
sense to take a closer look at the depth.

Let us associate with a vertex $v$ of a $(k, d)$-tree the $(k, d)$-vector $(x_0, \ldots, x_k)$,
where $x_j$ is the number of leaves at distance $j$ from $v$ among the descendants of
$v$. A minimal $(k, d)$-tree has no two vertices along the same branch with identical
vectors, so the depth of a minimal $(k, d)$-tree is limited by the number of $(k, d)$-vectors,
which is less than $(d + 1)^{k+1}$. For $d = f(k) + 1$ this is $2^{\Theta(k^2)}$. The same bound
for minimal $(k, d)$-CNF formulas in MU(1) is implicit in [11]. There is numerical
evidence that the size of the minimal $(k, f(k) + 1)$-tree and the minimal $(k, d)$-CNF
in MU(1) might indeed be doubly exponential in $k$ (consider the size of the minimal
(7, 18)-tree and the minimal (7, 18)-CNF in MU(1) mentioned below).

A closer analysis of the proof of Theorem 2.1 shows that the depth of the $(k, d)$-tree
constructed in it is at most $kd$. While this is better than the general upper
bound above it still allows for trees with sizes that are doubly exponential in $k$.

This depth can, however, be substantially decreased if we allow the error term in
$d$ to slightly grow. If we allow $d = (1 + \epsilon)\frac{2^{k+1}}{ek}$ for a fixed $\epsilon > 0$, then a more careful
analysis shows that the depth of the tree created becomes $O(k)$. This bounds the size
of the tree and the corresponding formula by a polynomial of $d$. Here we just sketch
the argument for this statement and do not give a formal proof. The main point is



H. GEBAUER, T. SZABO, AND G. TARDOS 



to note that we can stop the recursion defining $x^{(t)}$ in the proof of Theorem 2.1 as
soon as the weight of one of these vectors reaches 1 and to give an accurate estimate
of how fast this will happen. For this we define a continuous two-variable function
$F(T, X)$ on $T \ge 0$ and $X \in [0, 1]$ and use it to approximate the vectors $x^{(t)}$. Let
$Q_t = \sum_{i=0}^{t-1} q_i$ for the integers $q_t$ defined alongside $x^{(t)} = (x_0^{(t)}, \ldots, x_k^{(t)})$ in the same
proof. We show that

$x_j^{(t)} \approx \frac{d}{2^{k+1-j}} F\left( \frac{Q_t}{k}, \frac{j}{k} \right).$

The function $F$ is given by the boundary conditions $F(0, X) = 1$ for $X \in [0, 1]$,
$F(T, 1) = 1$ for $T \ge 0$ and by the differential equation

$D_{(1,-1)} F(T, X) = \beta F(T, X) F(T, 0),$

where $D_{(1,-1)}$ stands for the derivative in the direction $(1, -1)$, that is
$D_{(1,-1)} = \frac{\partial}{\partial T} - \frac{\partial}{\partial X}$, and $\beta = dk/2^{k+1}$ is a constant. Analyzing the differential equation one finds that
if $\beta \le e^{-1}$, then $F(T, X)$ remains bounded by $e^{1-X}$, but for $\beta > e^{-1}$ it reaches
infinity fast. Naturally $F$ fails to approximate the discrete process at this point
of singularity, but somewhat before that point we find a corresponding $x^{(t)}$ vector
with large weight. The corresponding value $Q_t$ is $O(k)$ and thus the $(k, d)$-tree
obtained will have depth $O(k)$ too.

Let us define $f_1(k, d)$ for $d > f_1(k)$ to be the size of the smallest $(k, d)$-CNF
in MU(1) and let $f_2(k, d)$ stand for the size of the smallest $(k, d)$-tree, assuming
$d > f_2(k)$. While $f_1(k, f_1(k) + 1)$ and similarly $f_2(k, f_2(k) + 1)$ are probably doubly
exponential in $k$, for slightly larger values of $d$ both $f_1(k, d)$ and $f_2(k, d)$ are polynomial
in $d$ (and thus simply exponential in $k$).

Finally we mention the question whether $f_1(k) = f_2(k)$ for all $k$. In other words,
we ask whether we lose by selecting the $k$ vertices farthest from the root when
making an MU(1) $k$-CNF formula from a binary tree. It should be mentioned that
$f(k) = f_1(k)$ is also open, but $f_1(k) = f_2(k)$ seems to be a simpler question as
both functions are computable. Computing their values up to $k = 7$ we found
these values agreed. To gain more insight we computed the corresponding size
functions too and found that $f_1(k, d) = f_2(k, d)$ for $k \le 6$ and all $d > f_1(k)$. But
for $k = 7$, where $f_1(7) = f_2(7) = 17$, we found the following surprising result:
$f_1(7, d) = f_2(7, d)$ for $19 \le d \le 25$, but $f_1(7, 18) = 10,197,246,480,846$, while
$f_2(7, 18) = 10,262,519,933,858$. In other words, there exist (7, 18)-trees, but the
smallest (7, 18)-CNF formulas cannot be obtained from them. Does this indicate
that all other equalities are coincidences and $f_1$ and $f_2$ will eventually diverge?

A related algorithmic question is whether the somewhat simpler structure of
$(k, d)$-trees can be used to find an algorithm computing $f_2(k)$ substantially faster
than the algorithm of Hoory and Szeider [11] for computing $f_1(k)$. Such an algorithm
would give useful estimates for $f_1(k)$ and also $f(k)$. At present we use a similar
(and similarly slow) algorithm for either function.